Compositional balance and color driven content retrieval

ABSTRACT

For each image in a collection of images, a respective model of visual weight in the image and a respective model of color in the image are determined. An image query is generated from a target visual weight distribution and a target color template. For each of the images a respective score is calculated from the image query, the respective visual weight model, and the respective color model. At least one of the images is retrieved from a database based on the respective scores.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application relates to the following co-pending applications, each of which is incorporated herein by reference:

U.S. patent application Ser. No. 11/496,146, filed Jul. 31, 2006;

U.S. patent application Ser. No. 11/495,846, filed Jul. 27, 2006;

U.S. patent application Ser. No. 11/495,847, filed Jul. 27, 2006;

U.S. patent application Ser. No. 11/127,278, filed May 12, 2005; and

U.S. patent application Ser. No. 11/259,597, filed Oct. 25, 2005.

BACKGROUND

Individuals and organizations are rapidly accumulating large collections of digital content, including text, audio, graphics, animated graphics, and full-motion video. This content may be presented individually or combined in a wide variety of different forms, including documents, presentations, still photographs, commercial videos, home movies, and metadata describing one or more associated digital content files. As these collections grow in number and diversity, individuals and organizations increasingly will require systems and methods for retrieving the digital content from their collections.

Among the ways that commonly are used to retrieve digital content from a collection are browsing methods and text-based retrieval methods. Browsing methods involve manually scanning through the content in the collection. Browsing, however, tends to be an inefficient way to retrieve content and typically is useful only for small content collections. Text-based retrieval methods involve submitting queries to a text-based search engine that matches the query terms to textual metadata that is associated with the content. Text-based retrieval methods typically rely on the association of manual annotations to the content, which requires a significant amount of manual time and effort.

Content-based retrieval methods also have been developed for retrieving content based on the actual attributes of the content. Content-based retrieval methods involve submitting a description of the desired content to a content-based search engine, which translates the description into a query and matches the query to one or more parameters that are associated with the content. Some content-based retrieval systems support query-by-text, which involves matching query terms to descriptive textual metadata associated with the content. Other content-based retrieval systems additionally support query-by-content, which involves interpreting a query that describes the content in terms of attributes such as color, shape, and texture, abstractions such as objects, roles, and scenes, and subjective impressions, emotions, and meanings that are assigned to the content attributes. In some content-based image retrieval approaches, low level visual features are used to group images into meaningful categories that, in turn, are used to generate indices for a database containing the images. Exemplary low level features include texture, shape, and layout. The parameters (or terms) of an image query may be used to retrieve images in the databases that have indices that match the conditions in the image query. In general, the results of automatic categorization and indexing of images improve when the features that are used to categorize and index images accurately capture the features that are of interest to the person submitting the image queries.

A primary challenge in the design of a content-based retrieval system involves identifying meaningful attributes that can be extracted from the content and used to rank the content in accordance with the degree of relevance to a particular retrieval objective.

SUMMARY

In one aspect, the invention features a method in accordance with which, for each image in a collection of images, a respective model of visual weight in the image and a respective model of color in the image are determined. An image query is generated from a target visual weight distribution and a target color template. For each of the images a respective score is calculated from the image query, the respective visual weight model, and the respective color model. At least one of the images is retrieved from a database based on the respective scores.

The invention also features apparatus and machine-readable media storing machine-readable instructions for implementing the method described above.

Other features and advantages of the invention will become apparent from the following description, including the drawings and the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of an embodiment of a compositional balance and color driven content retrieval system.

FIG. 2 is a flow diagram of an embodiment of a compositional balance and color driven content retrieval method.

FIG. 3A is a diagrammatic view of a document that has a left-right symmetrical balance distribution of constituent objects.

FIG. 3B is a diagrammatic view of a document showing the visual center of the document and the true center of the document.

FIG. 3C is a diagrammatic view of a document that has a centered asymmetrical balance distribution of constituent objects.

FIG. 4 is a diagrammatic view of an exemplary color wheel.

FIG. 5 is a block diagram of an embodiment of a method of segmenting an image.

FIG. 6 is a flow diagram of an embodiment of a method of constructing a visual weight model of an image from a visual appeal map.

FIG. 7 is a diagrammatic view of various maps that are calculated in accordance with an embodiment of the method of FIG. 6.

FIG. 8 is a flow diagram of an embodiment of a method of producing a visual appeal map of an image.

FIG. 9 is a flow diagram of an embodiment of a method of producing a sharpness map of an image.

FIG. 10 is a diagrammatic view of various maps that are calculated in accordance with an embodiment of the method of FIG. 9.

FIG. 11 is a flow diagram of an embodiment of a method of producing a model of visual weight in an image from a visual appeal map of the image.

FIG. 12 is a diagrammatic view of various maps that are calculated in accordance with an embodiment of the method of FIG. 11.

FIG. 13 is a flow diagram of an embodiment of a method of producing a model of color for an image.

FIG. 14 is a flow diagram of an embodiment of a method by which the modeling engine 12 models the regions into which the input image is segmented.

FIG. 15 is a flow diagram of an embodiment of a method by which the modeling engine 12 produces a respective color model from the respective regions that are modeled in the input image.

FIG. 16A shows a segmented image that was produced from an exemplary input image in accordance with the color segmentation process of FIG. 5.

FIG. 16B shows a representation of a color model that was produced from the segmented image of FIG. 16A in accordance with the method of FIG. 13.

FIG. 17 is a flow diagram of an embodiment of a method of generating an image query.

FIG. 18 is a block diagram of an embodiment of a system for generating an image query from a document.

FIG. 19 is a flow diagram of an embodiment of a method of generating a target visual weight distribution from a document.

FIG. 20 is a diagrammatic view of a document that has a plurality of objects arranged in a compositional layout.

FIG. 21 is a diagrammatic view of an embodiment of a model of visual weight in the document shown in FIG. 20.

FIG. 22 is a diagrammatic view of a reflection of the visual weight model of FIG. 21 about a central vertical axis of the document shown in FIG. 20.

FIG. 23 is a flow diagram of an embodiment of a method of constructing the target color template from a document.

FIGS. 24A-24C show different color maps that are produced from the document of FIG. 20 in accordance with the method of FIG. 23.

FIGS. 25A and 25B are diagrammatic views of an embodiment of a user interface for specifying a visual weight distribution.

FIG. 26 is a diagrammatic view of the image color model of FIG. 16B positioned in a specific document location in relation to the document color model of FIG. 24C.

FIG. 27 is a graph illustrating threshold values that are used to adjust the image scores for extreme images in which either the visual weight quality or the color quality is below an empirically determined acceptable level.

FIG. 28 is a graph showing three different precision-recall curves.

FIG. 29 is a block diagram of an embodiment of a computer system that implements an embodiment of the compositional balance and color driven content retrieval system of FIG. 1.

DETAILED DESCRIPTION

In the following description, like reference numbers are used to identify like elements. Furthermore, the drawings are intended to illustrate major features of exemplary embodiments in a diagrammatic manner. The drawings are not intended to depict every feature of actual embodiments nor the relative dimensions of the depicted elements, and are not drawn to scale.

I. Introduction

The embodiments that are described in detail herein are capable of retrieving images (e.g., digital photographs, video frames, scanned documents, and other image-based graphic objects including mixed content objects) based on specified compositional balance and color criteria. In some of these embodiments, images are indexed in accordance with models of their respective distributions of visual weight and color. Images are retrieved based on comparisons of their associated visual weight and color based indices with the parameters of the compositional balance and color driven image queries.

Some embodiments are able to generate compositional balance and color driven queries from analyses of the distributions of visual weight and color in a document and a specified compositional balance objective. In this way, these embodiments may be used, for example, in digital publishing application environments to automatically retrieve one or more images that have colors that harmonize with a document under construction and that satisfy a compositional balance objective for the document.

II. Overview

FIG. 1 shows an embodiment of a compositional balance and color driven content retrieval system 10 that includes a modeling engine 12, a search engine 14, and a user interface 16. The modeling engine 12 builds a respective index 18 for each of the images 20 in a collection. The images 20 may be stored in one or more local or remote image databases. Each of the indices 18 typically is a pointer to a respective one of the images 20. The search engine 14 receives search parameters from the user interface 16, constructs image queries from the received parameters, compares the image queries to the indices 18, and returns to the user interface 16 ones of the indices 18 that are determined to match the image queries. The user interface 16 allows a user 22 to interactively specify search parameters to the search engine 14, browse the search results (e.g., thumbnail versions of the matching images), and view ones of the images that are associated to the matching indices returned by the search engine 14.

FIG. 2 shows an embodiment of a compositional balance and color driven content retrieval method that is implemented by the compositional balance and color driven content retrieval system 10 to enable compositional balance and color driven retrieval of images from the one or more local or remote image databases.

The modeling engine 12 determines for each of the images 20 a respective model of visual weight in the image and a respective model of color in the image (FIG. 2, block 23). In this process, the modeling engine 12 typically extracts features (or attributes) from each image 20 and constructs the respective visual weight model and the respective color model from the extracted features. The modeling engine 12 creates for each of the images 20 a respective index 18 from parameters of the respective visual weight and color models and associates the respective index to the corresponding image. The modeling engine 12 may store the indices 18 in a database separate from the images (as shown in FIG. 1) or it may store the indices with metadata that is associated with corresponding ones of the images 20. The modeling engine 12 typically performs the visual weight and color modeling of the images 20 as an offline process.

The search engine 14 generates an image query from a target visual weight distribution and a target color template (FIG. 2, block 24). In some embodiments, the compositional balance and color driven content retrieval system 10 infers the target visual weight distribution and the target color template automatically from an analysis of a document being constructed by the user and a specified compositional balance objective for the document. In other embodiments, the compositional balance and color driven content retrieval system 10 receives from the user interface 16 a direct specification by the user 22 of the target visual weight distribution and the target color template for the images to be retrieved by the system 10.

The compositional balance and color driven content retrieval system 10 calculates for each of the images a respective score from the image query, the respective visual weight model, and the respective color model (FIG. 2, block 26) and retrieves at least one of the images from a database based on the respective scores (FIG. 2, block 28). In this process, the search engine 14 compares the image query to the indices 18 and returns to the user interface 16 ones of the indices 18 that are determined to match the image queries. The search engine 14 ranks the indices 18 based on a scoring function that produces values indicative of the level of match between the image query and the respective indices 18, which define the respective models of visual weight and color in the images 20. The user 22 may request the retrieval of one or more of the images 20 associated to the results returned by the search engine 14. In response, the user interface 16 (or some other application) retrieves the requested images from the one or more local or remote image databases. The user interface 16 typically queries the one or more databases using ones of the indices returned by the search engine 14 corresponding to the one or more images requested by the user 22.

III. Compositional Balance

Compositional balance refers to a quality of a composition (or layout) of objects in a document. In particular, compositional balance refers to the degree to which the visual weight distribution of the objects in the document conforms to a compositional objective.

Visual weight (also referred to as “optical weight” or “dominance”) of an object refers to the extent to which the object stands out in a particular composition. The visual weight typically is affected by the object's shape, color, and size. In some embodiments, the visual weight of an object is defined as its area times its optical density.

Common compositional objectives include symmetrical balance,asymmetrical balance, and centered balance.

Symmetrical balance gives a composition harmony, which conveys a feeling of permanence and stability. One type of symmetrical balance is bilateral symmetry (or axial symmetry), which is characterized by one side of a composition mirroring the other. Examples of bilateral symmetry include left-right bilateral symmetry and top-bottom bilateral symmetry. FIG. 3A shows an example of a composition of objects that is characterized by left-right symmetrical balance. Another type of symmetrical balance is radial symmetry, which is characterized by the composition being mirrored along both the horizontal and vertical axes.

Asymmetrical balance gives a composition contrast, which creates interest. Asymmetrical balance typically is achieved by laying out objects of unequal visual weight about a point (referred to as the “fulcrum”) in the composition such that objects having higher visual weight are closer to the fulcrum than objects that have lower visual weight. The fulcrum may correspond to the center (i.e., the true center) of a document, but it more commonly corresponds to a visual center (also referred to as the “optical center”) of the document. As shown in FIG. 3B, the visual center 30 of a document 32 typically is displaced from the true center 34 of the document 32. The visual center commonly is displaced from the true center toward the top of the document a distance that is approximately 12.5% (or one-eighth) of the length of the vertical dimension 36 of the document. One type of asymmetrical balance is centered asymmetrical balance, which is characterized by an arrangement of objects of unequal weight that are balanced about a fulcrum located at a central point (typically the visual center) in a document. FIG. 3C shows an example of a composition of objects that is characterized by centered asymmetrical balance.

A composition is center balanced when the center of visual weight of the objects coincides with the visual center of the document in which the objects are composed. The objects in the composition shown in FIG. 3C are center balanced.
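For illustration only, the following Python sketch locates the visual center described above and tests whether a set of weighted objects is center balanced. The object representation (x, y, weight) and the tolerance are assumptions made for this example and are not part of the described embodiments.

    def visual_center(width, height):
        # Visual center: horizontally centered, displaced upward from the true
        # center by approximately 12.5% of the vertical dimension.
        return width / 2.0, height / 2.0 - 0.125 * height

    def center_of_visual_weight(objects):
        # objects: iterable of (x, y, weight) tuples; weight ~ area times optical density
        total = sum(w for _, _, w in objects)
        cx = sum(x * w for x, _, w in objects) / total
        cy = sum(y * w for _, y, w in objects) / total
        return cx, cy

    def is_center_balanced(objects, width, height, tol=0.02):
        vx, vy = visual_center(width, height)
        cx, cy = center_of_visual_weight(objects)
        return abs(cx - vx) <= tol * width and abs(cy - vy) <= tol * height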

IV. Color Harmony

Color harmony refers to color combinations (typically referred to as “color schemes”) that have been found to be pleasing to the human eye. Typically, the relationships of harmonic colors are described in terms of their relative positions around a “color wheel”, which shows a set of colors arranged around the circumference of a circle.

FIG. 4 shows an exemplary color wheel 38 that includes twelve colors. Complementary colors are located opposite each other on the color wheel (e.g., colors A and G are complementary colors). A split complementary scheme includes a main color and the two colors on either side of its complementary color on the opposite side of the color wheel (e.g., if color A is the main color, the split complementary colors are colors F and H). Related or analogous colors are located next to each other on the color wheel (e.g., colors A and B are related colors). Monochromatic colors are colors with the same hue but different tones, values, and saturation. Monochromatic colors are represented by a single respective color in the color wheel 38.
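As a minimal illustration of these color-wheel relationships, the following sketch computes complementary, split complementary, and analogous hues as angles on a twelve-color wheel. The 30-degree step and the helper names are assumptions for this example.

    WHEEL_STEP = 30  # twelve colors spaced 30 degrees apart

    def complementary(hue):
        return (hue + 180) % 360

    def split_complementary(hue):
        comp = complementary(hue)
        return (comp - WHEEL_STEP) % 360, (comp + WHEEL_STEP) % 360

    def analogous(hue):
        return (hue - WHEEL_STEP) % 360, (hue + WHEEL_STEP) % 360

    # Example: if the main color A sits at 0 degrees, its complement G is at 180
    # degrees and its split complements (F and H) are at 150 and 210 degrees.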

V. Segmenting an Image

In the illustrated embodiments, the models of visual weight and color in the images 20 are generated based on a region- (or object-) based processing of the images 20. In general, the images 20 may be segmented in a wide variety of different ways.

FIG. 5 is a block diagram of an exemplary embodiment of a method of segmenting an input image by extracting color patches in a way that maintains edges and detail regions.

In accordance with the method of FIG. 5, the modeling engine 12 accesses image data of the input image being processed (FIG. 5, block 110). In some embodiments, the image data are the color values (e.g., RGB values) of image forming elements (e.g., pixels) in the input image. In some embodiments, the modeling engine 12 may convert the image data to a desired color space (e.g., the CIE-Lab color space) before proceeding to the next processing stage.

The modeling engine 12 quantizes the image data (FIG. 5, block 112). In this process, the input image is quantized in accordance with a quantization table (or color palette). In one embodiment, lexical quantization is performed, for example, using one or more of the lexical quantization methods described in U.S. patent application Ser. No. 11/259,597, filed Oct. 25, 2005. In this process, individual image forming elements of the input image are associated with one of a plurality of lexical color names. Lexical quantization allows for a discrete outcome, permitting filtering of non-consistent colors within a color patch or region. The result of the quantization process is a set of sparsely quantized images.

The modeling engine 12 performs color morphological processing of the quantized image data (FIG. 5, stage 114). This process may include P levels of morphological processing (filtering) at different resolutions, where P is a positive integer. The output 116 of the morphological processing stage 114 identifies a plurality of regions of the input image. The constituent image forming elements in each of these regions have a common characteristic, such as a consistent color corresponding to one of the lexical color names in the quantization table.

The modeling engine 12 performs region/label processing of the input image based on the output 116 of the morphological processing stage 114 (FIG. 5, block 118). In the course of the region/label processing, the regions are labeled using lexical color names according to the consistent colors of the respective regions. In addition, some of the regions that are identified by the morphological processing stage 114 may be merged. For example, regions are merged if the modeling engine 12 determines that the regions correspond to a single portion or object of an original image (e.g., due to a color gradient occurring in the portion or object causing the lexical quantization of the portion or object to be classified into plural regions). The resulting segmentation map 119 is used by the modeling engine 12 to produce the visual appeal map, as described in detail below.

Additional details regarding the operation and various implementations of the color-based segmentation method of FIG. 5 are described in the following references, each of which is incorporated herein by reference: U.S. patent application Ser. No. 11/495,846, filed Jul. 27, 2006; U.S. patent application Ser. No. 11/495,847, filed Jul. 27, 2006; U.S. patent application Ser. No. 11/259,597, filed Oct. 25, 2005; Pere Obrador, “Multiresolution Color Patch Extraction,” SPIE Visual Communications and Image Processing, San Jose, Calif., USA, pp. 15-19 (January 2006); and Pere Obrador, “Automatic color scheme picker for document templates based on image analysis and dual problem,” in Proc. SPIE, vol. 6076, San Jose, Calif. (January 2006).
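The following simplified sketch conveys the overall flow of FIG. 5, assuming a small fixed palette: each pixel is quantized to its nearest palette color, and connected regions of consistent color are then extracted. Replacing the multiresolution morphological processing and region merging with a connected-component labeling step (scipy.ndimage.label) is a deliberate simplification for illustration, not the described embodiment.

    import numpy as np
    from scipy import ndimage

    def quantize(image_rgb, palette):
        # image_rgb: (H, W, 3) float array; palette: (K, 3) array of reference colors.
        # Returns an (H, W) array of palette indices (a sparsely quantized image).
        dist = np.linalg.norm(image_rgb[:, :, None, :] - palette[None, None, :, :], axis=-1)
        return np.argmin(dist, axis=-1)

    def segment(image_rgb, palette):
        # Returns a list of (palette_index, boolean_mask) entries, one per region.
        labels = quantize(image_rgb, palette)
        regions = []
        for k in range(len(palette)):
            components, count = ndimage.label(labels == k)  # connected regions of color k
            for r in range(1, count + 1):
                regions.append((k, components == r))
        return regions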

VI. Compositional Balance and Color Driven Content Retrieval

A. Indexing Images for Compositional Balance and Color Driven ContentRetrieval

1. Overview

The modeling engine 12 determines respective models of visual weight and color in the images 20 (see FIG. 2, block 23). In this process, the modeling engine 12 typically extracts features from each image 20 and constructs respective models of visual weight and color in the image from the extracted features. In the embodiments described in detail below, the modeling engine 12 generates the visual weight model based on a model of image visual appeal that correlates with visual weight. The color model captures spatial and color parameters that enable the search engine 14 to determine the closeness between the color template defined in the image query and the color morphology in the images 20. In this way, these embodiments are able to preferentially retrieve visually appealing images that meet the compositional balance and color criteria specified in the image queries.

2. Producing a Visual Weight Map of an Image

a. Overview

In some embodiments, the visual weight map of an input image is produced from a visual appeal map of the input image.

FIG. 6 shows an embodiment of a method by which the modeling engine 12 constructs a visual weight model of an input image from a visual appeal map. The input image is an image selected from the collection of images 20 that will be indexed by the visual weight indices 18 (see FIG. 1).

In accordance with the method of FIG. 6, the modeling engine 12 determines a visual appeal map of the input image (FIG. 6, block 90). The visual appeal map has values that correlate with the perceived visual quality or appeal of the corresponding areas of the input image. The modeling engine 12 identifies regions of high visual appeal in the input image from the visual appeal map (FIG. 6, block 92). The modeling engine 12 constructs a model of visual weight in the input image from the identified high visual appeal regions in the input image (FIG. 6, block 94).

FIG. 7 shows various maps that are calculated from an exemplary input image 96 in accordance with an embodiment of the method of FIG. 6. In the illustrated embodiment, a visual appeal map 98 is constructed from a contrast map 100, a color map 102, and a sharpness map 104. The contrast map 100 has values that correlate with the levels of contrast in the corresponding areas of the input image 96. The color map 102 has values that correlate with the levels of colorfulness in the corresponding areas of the input image 96. The sharpness map 104 has values that correlate with the levels of sharpness in the corresponding areas of the input image 96. The model 106 of visual weight in the input image 96 is constructed from the visual appeal map 98, as described in detail below.

b. Producing a Visual Appeal Map of an Image

FIG. 8 is a flow diagram of an embodiment of a method of producing a visual appeal map of an image. In accordance with this method, the modeling engine 12 determines a contrast map that includes values of a contrast metric across the input image (FIG. 8, block 120). The modeling engine 12 determines a color map that includes values of a color metric across the input image (FIG. 8, block 122). The modeling engine 12 determines a sharpness map that includes values of a sharpness metric across the input image (FIG. 8, block 124). The modeling engine 12 combines the contrast map, the color map, and the sharpness map to produce a visual appeal map of the input image (FIG. 8, block 126).

i. Producing a Contrast Map of an Image

In general, the modeling engine 12 may determine the contrast map in any of a wide variety of different ways.

In some embodiments, the modeling engine 12 calculates a respective contrast value for each of the segmented regions of the input image in the contrast map in accordance with the image contrast quality scoring process described in U.S. Pat. No. 5,642,433.

In other embodiments, the modeling engine 12 calculates the respective contrast value for each image forming element location i in the contrast map by evaluating the root-mean-square contrast metric (C_(RMS,i)) defined in equation (1) for each segmented region W_(i) in the input image.

$\begin{matrix}{C_{{RMS},i} = \sqrt{\frac{1}{n_{i} - 1} \cdot {\sum\limits_{j \in W_{i}}( {x_{j} - {\overset{\_}{x}}_{i}} )^{2}}}} & (1)\end{matrix}$

where n_(i) is the number of image forming elements in the region W_(i), x_(j) is the normalized gray-level value of image forming element j in region W_(i), x_(j) has a value 0≦x_(j)≦1, and

$\begin{matrix}{{\overset{\_}{x}}_{i} = {\frac{1}{n_{i}} \cdot {\sum\limits_{j \in W_{i}}x_{j}}}} & (2)\end{matrix}$
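A minimal sketch of equations (1) and (2), assuming the region W_(i) is supplied as a one-dimensional array of normalized gray-level values:

    import numpy as np

    def rms_contrast(gray_values):
        # gray_values: normalized gray levels x_j (0 <= x_j <= 1) for one region W_i
        x = np.asarray(gray_values, dtype=float)
        n = x.size
        if n < 2:
            return 0.0
        return np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1))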

In some embodiments, the modeling engine 12 computes the contrast measure Ω_(i,contrast) for each region in the contrast map by evaluating the contrast measure defined in equation (3) for each corresponding region W_(i) in the input image.

$\begin{matrix}{{\Omega}_{i,{contrast}} = \{ \begin{matrix}{1,} & {{if}\mspace{14mu} L_{i,\sigma} > 100} \\{{1 + {L_{i,\sigma}/100}},} & {{if}\mspace{14mu} L_{i,\sigma} \leq 100}\end{matrix} } & (3)\end{matrix}$

where L_(i,σ) is the respective variance of the luminance in the region W_(i) in the input image.

ii. Producing a Color Map of an Image

In general, the modeling engine 12 may determine the colorfulness map in any of a wide variety of different ways. In some embodiments, the modeling engine 12 calculates the respective color value for each of the segmented regions i in the color map in accordance with the color metric defined in equation (4):

M_(i,c)=σ_(i,ab)+0.37·μ_(i,ab)  (4)

In equation (4), the parameter σ_(i,ab) is the trigonometric length of the standard deviation in the ab plane of the Lab color space representation of the segmented region i in the input image. The parameter μ_(i,ab) is the distance of the center of gravity in the ab plane to the neutral color axis in the Lab color space representation of the segmented region i in the input image.
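A sketch of the colorfulness measure of equation (4), assuming the region is supplied as an (N, 2) array of (a, b) chroma coordinates from the CIE-Lab representation of the input image:

    import numpy as np

    def colorfulness(ab_values):
        ab = np.asarray(ab_values, dtype=float)
        sigma_ab = np.linalg.norm(ab.std(axis=0))   # length of the standard deviation in the ab plane
        mu_ab = np.linalg.norm(ab.mean(axis=0))     # distance of the center of gravity to the neutral axis
        return sigma_ab + 0.37 * mu_ab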

iii. Producing a Sharpness Map of an Image

(a) Overview

FIG. 9 shows an embodiment of a method by which the modeling engine 12 produces a sharpness map of an input image 130. FIG. 10 shows the various maps that are calculated in accordance with the method of FIG. 9.

In accordance with the method of FIG. 9, the modeling engine 12 determines an initial sharpness map 132 that includes values of a sharpness metric across the input image 130 (FIG. 9, block 134). The modeling engine 12 corrects the sharpness values in the initial sharpness map 132 based on a contrast map 136 of the input image 130 to produce a contrast-corrected sharpness map 138 (FIG. 9, block 140). The modeling engine 12 filters the contrast-corrected sharpness map 138 to produce a filtered sharpness map 142 (FIG. 9, block 144). The modeling engine 12 morphologically processes the filtered sharpness map 142 to produce a morphologically-processed sharpness map 146 (FIG. 9, block 148). The modeling engine 12 combines the morphologically-processed sharpness map 146 with a segmentation map 150 of the input image 130 and the contrast-corrected sharpness map 138 to produce a region-based sharpness map 152 (FIG. 9, block 154).

(b) Determining an Initial Sharpness Map (FIG. 9, Block 134)

The modeling engine 12 may determine the initial sharpness map 132 in any of a wide variety of different ways. In some embodiments, the modeling engine 12 determines the initial sharpness map 132 in accordance with a noise-robust sharpness estimation process. In an exemplary one of these embodiments, the modeling engine 12 computes a four-level Laplacian multiresolution pyramid from the input image 130 and combines the four resolution levels of the Laplacian pyramid to produce the initial sharpness map 132 with values that are resistant to high-frequency noise in the input image 130.
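The sketch below approximates the noise-robust estimate described above by summing the absolute band-pass responses of a four-level Laplacian-style decomposition. Using differences of Gaussian-blurred copies at increasing scales, rather than a decimated pyramid, and the particular sigma values are simplifications assumed for illustration.

    import numpy as np
    from scipy import ndimage

    def initial_sharpness_map(gray, sigmas=(1, 2, 4, 8)):
        # gray: (H, W) float array; returns an (H, W) sharpness map.
        sharpness = np.zeros_like(gray, dtype=float)
        previous = gray.astype(float)
        for sigma in sigmas:
            blurred = ndimage.gaussian_filter(previous, sigma)
            sharpness += np.abs(previous - blurred)  # Laplacian-like band-pass response
            previous = blurred
        return sharpness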

(c) Contrast-Correcting the Initial Sharpness Map (FIG. 9, Block 140)

The contrast map 136 that is used to correct the initial sharpness map 132 may be calculated in accordance with one of the contrast map calculation methods described above. In this process, the modeling engine 12 calculates a respective contrast map for each of three different sliding window sizes (e.g., 3×3, 7×7, and 11×11) and combines these multiresolution contrast maps to form the contrast map 136. In some embodiments, the modeling engine 12 combines the multiresolution contrast maps by selecting the maximum value of the contrast maps at each image forming element location in the input image as the contrast value for the corresponding location in the contrast map 136. In some embodiments, the modeling engine 12 also performs a morphological dilation on the result of combining the three multiresolution contrast maps. In one exemplary embodiment, the morphological dilation is performed with a dilation factor of 3.

The modeling engine 12 uses the contrast map 136 to correct the initial sharpness map 132. In this process, the modeling engine 12 reduces the sharpness values in areas of the sharpness map that correspond to areas of high contrast in the contrast map 136. In some embodiments, the modeling engine 12 multiplies the sharpness values by different sharpness factors depending on the corresponding contrast values. In some of these embodiments, the contrast-corrected sharpness values S_(corrected) in the contrast-corrected sharpness map 138 are calculated from the initial sharpness values S_(initial) based on the contrast value C at the corresponding image forming element location as follows:

If C<Φ,

then S_(corrected)=S_(initial)·(1−α·(C−Φ)),

else S_(corrected)=S_(initial)·β·e^(−γ·(C−Φ))

where Φ is an empirically determined contrast threshold value, and α, β, and γ are empirically determined parameter values. In one exemplary embodiment, Φ=50, α=0.0042, β=0.8, and γ=0.024. In some embodiments, the values of S_(corrected) are truncated at 255.
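The correction rule above transcribes directly into an element-wise operation, as in the sketch below; the default constants are the exemplary values quoted in the text, and the function name is arbitrary.

    import numpy as np

    def contrast_correct(sharpness, contrast, phi=50.0, alpha=0.0042, beta=0.8, gamma=0.024):
        s = np.asarray(sharpness, dtype=float)
        c = np.asarray(contrast, dtype=float)
        corrected = np.where(
            c < phi,
            s * (1.0 - alpha * (c - phi)),           # low-contrast branch
            s * beta * np.exp(-gamma * (c - phi)),   # high-contrast branch
        )
        return np.minimum(corrected, 255.0)          # truncate at 255, as noted above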

(d) Filtering the Contrast-Corrected Sharpness Map (FIG. 9, Block 144)

The modeling engine 12 typically filters the contrast-corrected sharpness map 138 using an edge-preserving smoothing filter to produce a filtered sharpness map 142. This process further distinguishes the sharp regions from the blurred regions. In some embodiments, the modeling engine 12 filters the contrast-corrected sharpness map 138 with a bilateral Gaussian filter. In one exemplary embodiment, the bilateral Gaussian filter has a window size of 5×5 pixels, a closeness function standard deviation σ_(i)=10, and a similarity function standard deviation σ_(s)=1.

(e) Morphologically Processing the Filtered Sharpness Map (FIG. 9, Block148)

The modeling engine 12 morphologically processes the filtered sharpness map 142 to produce a dense morphologically-processed sharpness map 146. In some embodiments, the modeling engine 12 sequentially performs the morphological operations of closing, opening, and erosion on the filtered sharpness map 142. In one exemplary embodiment, the modeling engine 12 performs these morphological operations with the following parameters: the closing operation is performed with a closing parameter of 7; the opening operation is performed with an opening parameter of 3; and the erosion operation is performed with an erosion parameter of 5.

(f) Producing the Region-Based Sharpness Map (FIG. 9, Block 154)

The modeling engine 12 combines the morphologically-processed sharpness map 146 with a segmentation map 150 of the input image 130, which is calculated in accordance with the image segmentation process described above in § V (see FIG. 5), to produce a region-based sharpness map 152. In this process, the modeling engine 12 assigns a sharpness value (sharpnessValue_(i)) to each of the regions i in the segmentation map 150 based on the sharpness values that are specified in the morphologically-processed sharpness map 146 for the region. The sharpness value that is assigned to a particular region of the region-based sharpness map 152 depends on a weighted accumulation of the sharpness values of the image forming elements in the corresponding region of the morphologically-processed sharpness map 146. The weights depend on a multi-tiered thresholding of the sharpness values in the morphologically-processed sharpness map 146, where higher sharpness values contribute more than lower sharpness values to the accumulated sharpness value assigned to the region. The accumulated weighted sharpness value for each region is averaged over the number of image forming elements in the region that contributed to the accumulated value. In some embodiments, the modeling engine 12 also detects highly textured regions in the morphologically-processed sharpness map 146 and reduces the average accumulated weighted sharpness values in the detected highly textured regions.
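One plausible reading of the weighted accumulation is sketched below; the tier boundaries and weights are assumptions, since the text states only that higher sharpness values contribute more than lower ones and that the result is averaged over the contributing image forming elements.

    import numpy as np

    def region_sharpness(pixel_sharpness, tiers=((192, 1.0), (128, 0.5), (64, 0.25))):
        # pixel_sharpness: values from the morphologically-processed map for one region
        s = np.asarray(pixel_sharpness, dtype=float)
        accumulated, contributors = 0.0, 0
        for threshold, weight in tiers:              # highest tier first
            mask = s > threshold
            accumulated += weight * s[mask].sum()
            contributors += int(mask.sum())
            s = np.where(mask, -np.inf, s)           # each element contributes to one tier only
        return accumulated / contributors if contributors else 0.0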

iv. Producing a Visual Appeal Map from a Combination of the ContrastMap, the Color Map, and the Sharpness Map

The modeling engine 12 combines the contrast map, the color map, and the sharpness map to produce a visual appeal map of the input image (see FIG. 8, block 126). The contrast map, the color map, and the sharpness map are combined in an additive fashion, since there may be areas with high frequency content (high sharpness and contrast) but low colorfulness and, vice versa, areas with low frequency content but high colorfulness. Both cases are captured in the scoring function described below. In some embodiments, a respective value for each of the segmented regions i in the visual appeal map is calculated in accordance with the process defined in connection with equations (5) and (6).

If sharpnessDensity_(i)<sharpDensityThres then

$\begin{matrix}{{imageAppealMap}_{j \in {region}_{i}} = {{finalSharpnessMap}_{i} + \frac{{colorful}_{i}}{A + {B \cdot {sharpnessDensity}_{i}}} + \frac{{contrast}_{i}}{C + {D \cdot {sharpnessDensity}_{i}}}}} & (5)\end{matrix}$

If sharpnessDensity_(i)≧sharpDensityThres then

$\begin{matrix}{{imageAppealMap}_{j \in {region}_{i}} = {{finalSharpnessMap}_{i} + {\frac{1}{E}{colorful}_{i}} + {\frac{1}{F}{contrast}_{i}}}} & (6)\end{matrix}$

where the parameters sharpDensityThres, A, B, C, D, E, and F have empirically determined values. In this process, the parameter sharpnessDensity is the percentage of area with sharp objects within a region. In some embodiments, the sharpnessDensity for each region i is calculated in accordance with equation (7).

$\begin{matrix}{{sharpnessDensity}_{i} = {\frac{1}{n_{i}} \cdot {\sum\limits_{j \in {region}_{i}}\{ \begin{matrix}{1,} & {{{if}\mspace{14mu} {rawSharpnessMap}_{j}} > {rawSharpnessThreshold}} \\{0,} & {{{if}\mspace{14mu} {rawSharpnessMap}_{j}} \leq {rawSharpnessThreshold}}\end{matrix} }}} & (7)\end{matrix}$

where rawSharpnessMap_(j) is the sharpness value of the image forming element j in the region i.
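A sketch of the region-wise combination of equations (5)-(7) follows. The constants (sharpDensityThres and A through F) are placeholders, since the text says only that they are empirically determined; the raw sharpness threshold is likewise an assumption.

    import numpy as np

    def region_appeal(final_sharpness, colorful, contrast, raw_sharpness,
                      raw_sharpness_threshold=128.0, sharp_density_thres=0.1,
                      A=1.0, B=10.0, C=1.0, D=10.0, E=2.0, F=2.0):
        # raw_sharpness: per-element sharpness values for one region; the scalar
        # inputs are the region's sharpness, colorfulness, and contrast measures.
        raw = np.asarray(raw_sharpness, dtype=float)
        density = np.mean(raw > raw_sharpness_threshold)        # equation (7)
        if density < sharp_density_thres:                       # equation (5)
            return (final_sharpness
                    + colorful / (A + B * density)
                    + contrast / (C + D * density))
        return final_sharpness + colorful / E + contrast / F    # equation (6)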

v. Producing a Model of Visual Weight in an Image from a Visual AppealMap of the Image

FIG. 11 shows an embodiment of a method by which the modeling engine 12 produces a model of visual weight in an image from a visual appeal map of the image. FIG. 12 shows various maps that are calculated in accordance with an embodiment of the method of FIG. 11.

In accordance with the method of FIG. 11, the modeling engine 12 thresholds the visual appeal map 98 to produce a thresholded visual appeal map 158 (FIG. 11, block 160). In some embodiments, the modeling engine 12 thresholds the values in the visual appeal map 98 with a threshold that is set to 50% of the maximum value in the visual appeal map. In this process, the modeling engine 12 produces a binary visual appeal map 158 with values of 255 at image forming element locations where the values of the corresponding image forming elements in the visual appeal map 98 are above the threshold and values of 0 at the remaining image forming element locations.

The modeling engine 12 calculates a centroid of visual weight from the thresholded visual appeal map 158 (FIG. 11, block 162). In some embodiments, the modeling engine 12 calculates the image centroid by weighting the horizontal and vertical coordinates in the image with the visual appeal values A_(i) associated with those coordinates.

$\begin{matrix}{x_{{image}\text{-}{centroid}} = {100 \cdot \frac{\sum\limits_{i}{x_{i} \cdot A_{i}}}{D_{{image} - H} \cdot {\sum\limits_{i}A_{i}}}}} & (8) \\{y_{{image}\text{-}{centroid}} = {100 \cdot \frac{\sum\limits_{i}{y_{i} \cdot A_{i}}}{D_{{image} - V} \cdot {\sum\limits_{i}A_{i}}}}} & (9)\end{matrix}$

where x_(i) and y_(i) are the x-coordinate and the y-coordinate of image forming element i in the image, A_(i) is the visual appeal value of image forming element i, and D_(image-H) and D_(image-V) are the horizontal and vertical dimensions of the image.

The modeling engine 12 determines a horizontal spread and a vertical spread of the identified regions of high visual appeal about the calculated centroid to produce a model 164 of visual weight in the input image (FIG. 11, block 166). In some embodiments, the horizontal and vertical spreads (σ_(image-H), σ_(image-V)) correspond to the standard deviations of the visual appeal values A_(i) about the centroid along the horizontal and vertical dimensions of the image.

$\begin{matrix}{\sigma_{{image} - H} = {\frac{100}{D_{{image} - H}} \cdot \sqrt{\frac{\sum\limits_{i}^{Z}\lbrack {( {x_{i} - x_{{image}\text{-}{centroid}}} ) \cdot A_{i}} \rbrack^{2}}{Z \cdot {\sum\limits_{i}^{Z}A_{i}^{2}}}}}} & (10) \\{\sigma_{{image} - V} = {\frac{100}{D_{{image} - V}} \cdot \sqrt{\frac{\sum\limits_{i}^{Z}\lbrack {( {y_{i} - y_{{image}\text{-}{centroid}}} ) \cdot A_{i}} \rbrack^{2}}{Z \cdot {\sum\limits_{i}^{Z}A_{i}^{2}}}}}} & (11)\end{matrix}$

where Z is the number of image forming elements in the image.
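A sketch of equations (8)-(11), which treats the thresholded visual appeal map as a numpy array and returns the centroid and spreads as percentages of the image dimensions. Converting the percentage centroid back to pixel units inside the spread computation is an interpretation made for this example.

    import numpy as np

    def visual_weight_model(appeal_map):
        # appeal_map: (H, W) array of visual appeal values A_i (assumed not all zero)
        a = np.asarray(appeal_map, dtype=float)
        h, w = a.shape
        ys, xs = np.mgrid[0:h, 0:w]
        total = a.sum()
        cx = 100.0 * (xs * a).sum() / (w * total)    # equation (8)
        cy = 100.0 * (ys * a).sum() / (h * total)    # equation (9)
        z = a.size
        denom = z * (a ** 2).sum()
        sx = (100.0 / w) * np.sqrt((((xs - cx * w / 100.0) * a) ** 2).sum() / denom)  # equation (10)
        sy = (100.0 / h) * np.sqrt((((ys - cy * h / 100.0) * a) ** 2).sum() / denom)  # equation (11)
        return cx, cy, sx, sy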

The modeling engine 12 creates a respective index 18 from the parameters {x_(image-centroid), y_(image-centroid), σ_(image-H), σ_(image-V)} of each of the visual weight models and associates the respective index to the corresponding image. The modeling engine 12 may store the indices 18 in a database that is separate from the images 20 (as shown in FIG. 1) or it may store the indices with metadata that is associated with the corresponding ones of the images 20. The modeling engine 12 typically performs the visual weight modeling process as an offline process.

Other embodiments of the modeling engine 12 may produce a model of the visual weight distribution in an image from a visual appeal map of the image in ways that are different from the method described above. For example, in some embodiments, the modeling engine 12 may produce a model of image visual weight from a Gaussian mixture model approximation of the visual appeal map 98. In these embodiments, the parameters of the Gaussian mixture models may be used as the visual weight indices 18 for one or more of the images 20.

3. Producing a Model of Color in an Image

FIG. 13 shows an embodiment of a method of producing a model of color for each of the images 20. In accordance with this method, the modeling engine 12 models the regions in the respective segmented image for each of the input images 20 (FIG. 13, block 151). In some embodiments, the respective segmented image is produced from the input image in accordance with the color segmentation process described above in § V (see FIG. 5). For each of the input images 20, the modeling engine 12 produces a respective color model from the respective modeled regions (FIG. 13, block 153).

FIG. 14 shows an embodiment of a method by which the modeling engine 12 models the regions into which the input image is segmented (FIG. 13, block 151). In accordance with this method, the modeling engine 12 calculates for each region a respective centroid (FIG. 14, block 155), a respective average color (FIG. 14, block 157), and a respective patch size (FIG. 14, block 159). In some embodiments, the modeling engine 12 calculates the respective centroid of each region by weighting the horizontal and vertical coordinates in the region with the luminance values associated with those coordinates in accordance with equations (12) and (13).

$\begin{matrix}{x_{{region}\text{-}{centroid}} = {100 \cdot \frac{\sum\limits_{i}{x_{i} \cdot L_{i}}}{D_{{image} - H} \cdot {\sum\limits_{i}L_{i}}}}} & (12) \\{y_{{region}\text{-}{centroid}} = {100 \cdot \frac{\sum\limits_{i}{y_{i} \cdot L_{i}}}{D_{{image} - V} \cdot {\sum\limits_{i}L_{i}}}}} & (13)\end{matrix}$

In equations (12) and (13), x_(i) and y_(i) are the x-coordinate and the y-coordinate of image forming element i in the region, D_(image-H) and D_(image-V) are the image's horizontal and vertical dimensions, and L_(i) is the luminance value of image forming element i. In accordance with equations (12) and (13), the modeling engine 12 calculates the respective centroid of each region as a percentage of the image's horizontal and vertical dimensions. In some exemplary embodiments, the patch size of a region is a count of the number of image forming elements in the region.

FIG. 15 shows an embodiment of a method by which the modeling engine 12 produces a respective color model from the respective regions that are modeled in the input image (FIG. 13, block 153). In accordance with this method, the modeling engine 12 calculates a histogram of the average colors of the regions (FIG. 15, block 161). The modeling engine 12 selects the largest color bins covering a minimum proportion (e.g., 90%) of the total color areas (i.e., non-gray areas) of the input image (FIG. 15, block 163). The modeling engine 12 produces the respective color model from the regions having average colors in the selected color bins (FIG. 15, block 165).
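A sketch of the bin-selection step of FIG. 15: region average colors are histogrammed and the largest bins that together cover a minimum fraction of the colored area are kept. The dictionary-based region format and the use of a hashable bin key (e.g., a quantized hue or lexical color name) are assumptions for illustration.

    from collections import defaultdict

    def select_color_bins(regions, min_coverage=0.9):
        # regions: list of dicts with 'avg_color' (a hashable bin key) and 'size' (element count)
        bins = defaultdict(int)
        for region in regions:
            bins[region['avg_color']] += region['size']
        total = sum(bins.values())
        selected, covered = [], 0
        for color, area in sorted(bins.items(), key=lambda kv: kv[1], reverse=True):
            selected.append(color)
            covered += area
            if covered >= min_coverage * total:
                break
        return selected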

FIG. 16A shows a segmented image 167 that was produced from an exemplary input image in accordance with the color segmentation process described above in § V (see FIG. 5). FIG. 16B shows a representation of a color model 169 that was produced from the segmented image 167 in accordance with the method of FIG. 13. In FIG. 16B, the regions are modeled by circles having centers that coincide with the centroids of the corresponding regions in the segmented image 167 and having areas that encompass a number of image forming elements corresponding to the patch sizes of the corresponding regions.

Additional details regarding the operation and various implementations of the color modeling methods of FIGS. 13-15 are described in Pere Obrador, “Automatic color scheme picker for document templates based on image analysis and dual problem,” in Proc. SPIE, vol. 6076, San Jose, Calif. (January 2006).

B. Generating Image Queries for Compositional Balance and Color DrivenContent Retrieval

1. Overview

As explained above, the search engine 14 generates an image query that is used to retrieve at least one of the images from a database based on comparisons of the image query with respective ones of the visual weight and color models of the images 20.

FIG. 17 shows an embodiment of a method by which an embodiment of the search engine 14 generates an image query. In accordance with this method, the search engine 14 determines a target visual weight distribution and a target color template (FIG. 17, block 40). The search engine 14 then generates an image query from the specification of the target visual weight distribution and the target color template (FIG. 17, block 42).

2. Document-Based Image Query Generation

a. Overview

In some embodiments, the compositional balance and color driven content retrieval system 10 infers a visual weight model corresponding to the target visual weight distribution and a color model corresponding to a target color template automatically from an analysis of a document being constructed by the user and a specified compositional balance objective for the document.

FIG. 18 shows an embodiment 44 of the search engine 14 that generates a visual weight and color based query 46 from a document and a compositional balance objective that are specified by the user 22 through the user interface 16. The document typically is stored in a local or remote computer-readable storage device 48 that is accessible by the user interface 16 and the search engine 44.

This embodiment of the search engine 14 has particular applicability to an application environment in which the user 22 is constructing a document and wishes to incorporate in the document an image that balances the other objects in the document in a way that achieves a particular compositional balance objective and that has colors that achieve a specified color harmony objective (e.g., affine, complementary, split complementary, triadic). In this case, the search engine 44 determines a model of the current visual weight distribution in the document and a model of the color in the document. The search engine 44 uses the visual weight and color models of the document to form an image query that targets images having visual weight distributions and colors that complement the current state of the document in ways that meet the user's compositional balance and color objectives.

b. Constructing a Target Visual Weight Distribution from a Document

FIG. 19 shows an embodiment of a method by which the search engine 44 generates a target visual weight distribution from a model of the visual weight distribution in a document. In accordance with this method, the search engine 44 calculates a centroid of visual weight in the document (FIG. 19, block 50). The search engine 44 determines a horizontal spread and a vertical spread of the visual weight about the calculated centroid (FIG. 19, block 52). The search engine 44 generates a target visual weight distribution from the calculated centroid and the determined horizontal and vertical spreads (FIG. 19, block 54).

FIGS. 20-22 show an illustration of the operation of the search engine 44 in accordance with the method of FIG. 19 in the specific context of an exemplary document and an exemplary compositional balance objective that are specified by the user 22.

FIG. 20 shows an example of a document 56 that has a plurality of objects 58-70 that are arranged in a current compositional layout. In this example, the user 22 wants to insert an image in the area demarcated by the dashed circle 72. Through the user interface 16, the user 22 submits to the search engine 44 a request for a set of one or more images that have respective visual weight distributions that complement the current visual weight distribution in the document 56 to achieve a composition that has a left-right symmetrical balance.

In response to the user's request, the search engine 44 calculates a centroid of visual weight in the document (FIG. 19, block 50). In some embodiments, the search engine 44 calculates the document centroid (x_(doc-centroid), y_(doc-centroid)) as a percentage of the document's horizontal and vertical dimensions (D_(doc-H), D_(doc-V)) in accordance with equations (14) and (15):

$\begin{matrix}{x_{{doc}\text{-}{centroid}} = {100 \cdot \frac{\sum\limits_{j}{x_{j} \cdot E_{j}}}{D_{{doc} - H} \cdot {\sum\limits_{j}E_{j}}}}} & (14) \\{y_{{doc}\text{-}{centroid}} = {100 \cdot \frac{\sum\limits_{j}{y_{j} \cdot E_{j}}}{D_{{doc} - V} \cdot {\sum\limits_{j}E_{j}}}}} & (15)\end{matrix}$

where (x_(j), y_(j)) are the coordinates of the centroid of object j, and E_(j) is the number of image forming elements (e.g., pixels) in object j. In some embodiments, the search engine 44 calculates the document centroid by weighting the horizontal and vertical coordinates in the document with the luminance values associated with those coordinates in accordance with equations (16) and (17).

$\begin{matrix}{x_{{doc}\text{-}{centroid}} = {100 \cdot \frac{\sum\limits_{i}{x_{i} \cdot L_{i}}}{D_{{doc} - H} \cdot {\sum\limits_{i}L_{i}}}}} & (16) \\{y_{{doc}\text{-}{centroid}} = {100 \cdot \frac{\sum\limits_{i}{y_{i} \cdot L_{i}}}{D_{{doc} - V} \cdot {\sum\limits_{i}L_{i}}}}} & (17)\end{matrix}$

In these equations, x_(i) and y_(i) are the x-coordinate and the y-coordinate of image forming element i in the document and L_(i) is the luminance value of image forming element i.

The search engine 44 also determines a horizontal spread and a vertical spread of the visual weight about the calculated centroid (FIG. 19, block 52). In some embodiments, the horizontal and vertical spreads (σ_(doc-H), σ_(doc-V)) correspond to the standard deviations of the luminance values about the centroid along the horizontal and vertical dimensions of the document, expressed as percentages of the document's horizontal and vertical dimensions.

$\begin{matrix}{\sigma_{{doc}\text{-}H} = {\frac{100}{D_{{doc}\text{-}H}} \cdot \sqrt{\frac{\sum\limits_{i}^{K}\lbrack {( {x_{i} - x_{{doc}\text{-}{centroid}}} ) \cdot L_{i}} \rbrack^{2}}{K \cdot {\sum\limits_{i}^{K}L_{i}^{2}}}}}} & (18) \\{\sigma_{{doc}\text{-}V} = {\frac{100}{D_{{doc}\text{-}V}} \cdot \sqrt{\frac{\sum\limits_{i}^{K}\lbrack {( {y_{i} - y_{{doc}\text{-}{centroid}}} ) \cdot L_{i}} \rbrack^{2}}{K \cdot {\sum\limits_{i}^{K}L_{i}^{2}}}}}} & (19)\end{matrix}$

where K is the number of image forming elements in the document.

FIG. 21 shows an embodiment of a model 74 of visual weight in the document 56 (see FIG. 20). In this embodiment, the visual weight model is an ellipse that has a centroid coincident with the center of visual weight in the document 56 (i.e., the calculated centroid location (x_(doc-centroid), y_(doc-centroid))) and horizontal and vertical dimensions equal to the horizontal and vertical spreads of the visual weight about the calculated centroid (i.e., σ_(doc-H) and σ_(doc-V)). In other embodiments, the visual weight in the document may be modeled by a different shape, including but not limited to, for example, a rectangle, a circle, and a square.

The search engine 44 generates a target visual weight distribution from the calculated centroid (x_(doc-centroid), y_(doc-centroid)) and the determined horizontal and vertical spreads (σ_(doc-H), σ_(doc-V)) (FIG. 19, block 54). In this process, the search engine 44 geometrically transforms the model of visual weight in the document in accordance with the compositional balance objective, and produces the target visual weight distribution from attributes of the geometrically transformed visual weight model.

For example, if the compositional balance objective is left-right symmetrical balance, the search engine 44 transforms the visual weight model by reflecting the model about an axis parallel to a vertical dimension of the document and extending through a central point (e.g., the visual center) in the document, as suggested by the arrow 97 in FIG. 22. In some embodiments, the search engine 44 transforms the visual weight model by re-computing the horizontal coordinate of the document centroid about the central vertical axis 76 (see FIG. 22) in accordance with equation (20):

x _(query-centroid)=100−x _(doc-centroid)  (20)

The vertical coordinate of the document centroid and the horizontal and vertical visual weight spreads are unchanged. That is,

y_(query-centroid)=y_(doc-centroid)  (21)

σ_(query-H)=σ_(doc-H)  (22)

σ_(query-V)=σ_(doc-V)  (23)

If the compositional balance objective is centered balance, the search engine 44 transforms the visual weight model by reflecting the model about an axis inclined with respect to the horizontal and vertical dimensions of the document and extending through a central point (e.g., the visual center) in the document. In some embodiments, the search engine 44 transforms the visual weight model by re-computing the horizontal and vertical coordinates of the document centroid in accordance with equations (24) and (25):

x _(query-centroid)=100−x _(doc-centroid)  (24)

y _(query-centroid)=100−y _(doc-centroid)  (25)

The search engine 44 constructs the target visual weight distribution from the target visual weight distribution parameters {x_(query-centroid), y_(query-centroid), σ_(query-H), σ_(query-V)}. In some embodiments, these parameters are incorporated into an SQL implementation of the image query.
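The query-construction step of equations (20)-(25) reduces to a simple reflection of the document model in percentage coordinates, as in the sketch below; the dictionary form of the result is an assumption, since the text specifies an SQL implementation only in general terms.

    def target_visual_weight(doc_centroid_x, doc_centroid_y, spread_h, spread_v,
                             objective="left-right"):
        if objective == "left-right":          # equations (20)-(23)
            qx, qy = 100.0 - doc_centroid_x, doc_centroid_y
        elif objective == "centered":          # equations (24)-(25)
            qx, qy = 100.0 - doc_centroid_x, 100.0 - doc_centroid_y
        else:
            raise ValueError("unsupported compositional balance objective")
        return {"x": qx, "y": qy, "sigma_h": spread_h, "sigma_v": spread_v}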

c. Constructing a Target Color Template from a Document

FIG. 23 shows an embodiment of a method of constructing the target color template from a document. FIGS. 24A-24C show different color maps that are produced from the document 56 in accordance with the method of FIG. 23.

In accordance with this method, the search engine 44 segments the document into regions (FIG. 23, block 79). In some embodiments, the search engine 44 processes the document in accordance with the color segmentation process described above in § V (see FIG. 5) to segment the document into regions. FIG. 24A shows a segmentation map that was produced from the document 56 (see FIG. 20) in accordance with the color segmentation process of FIG. 5.

The search engine 44 labels each of the regions with a respective color (FIG. 23, block 81). In some embodiments, the search engine 44 labels the regions with an average of the lexical color names assigned to the constituent image forming elements based on the quantization table used to segment the document into regions (see § V above).

The search engine 44 calculates a respective centroid and a respective size for one or more of the labeled regions (FIG. 23, block 83). In some embodiments, the search engine 44 calculates the region centroids in accordance with the method of FIG. 14 (see equations (12) and (13)). In some embodiments, the region size is a count of the number of image forming elements in the region. FIG. 24B shows a representation of a color model that was produced from the segmented image of FIG. 24A, where the regions are modeled by circles having centers that coincide with the centroids of the corresponding regions in the segmented image and having areas that encompass a number of image forming elements corresponding to the patch sizes of the corresponding regions.

The search engine 44 builds the target color template from the calculated centroids and the calculated sizes (FIG. 23, block 85). In some embodiments, the search engine 44 builds the target color template from the color model parameters {x_(doc-centroid, region-k), y_(doc-centroid, region-k), Size_(region-k), Color_(ave-region-k)} ∀ regions k. In some embodiments, these parameters are incorporated into an SQL implementation of the image query. FIG. 24C shows a representation of a color model that was produced from the color model of FIG. 24B in accordance with the method of FIG. 15.

3. Manual Image Query Generation

In some embodiments, the compositional balance and color driven content retrieval system 10 receives from the user interface 16 a direct specification by the user 22 of the desired visual weight and color palette in the images to be retrieved by the system 10.

FIGS. 24A and 24B show a diagrammatic view of an embodiment 80 of theuser interface 16 that allows the user 22 to specify a target visualweight distribution and color palette for the images that the user wouldlike the search engine 14 to retrieve. The user interface 80 includes aspecification area 82 and a template selection area 84.

The user 22 can specify the target visual weight distribution by dragging a template (e.g., the star template 86) from the template selection area 84 into the specification area 82 and scaling the selected template to match the user's conception of the target visual weight distribution. In the illustrated embodiment, the specification area 82 is configured to allow the user 22 to view an image 88, as shown in FIG. 25A. The user may use the displayed image 88 as a guide for selecting and scaling the selected template to conform to a target visual weight distribution matching the perceived visual weight distribution in the image 88, as shown in FIG. 25B. The final shape, size, and location of the template correspond to the shape, size, and location of the target visual weight distribution. In some embodiments, the user interface 80 includes drawing tools that allow the user 22 to simply draw the shape of the target visual weight distribution with respect to a designated compositional area presented in the specification area 82. After the user 22 has completed the specification of the graphical representation of the target visual weight distribution, the search engine 14 extracts parameters that define the shape, size, and location of that graphical representation and incorporates the extracted parameters into an image query.

The user 22 can specify the target color template by selecting an image (e.g., image 88) that contains a color palette and color distribution that the user 22 would like to see in the images retrieved by the search engine 14 (e.g., the selected image contains a color palette that meets the user's color harmonization objective). Alternatively, the user 22 may specify the target color template directly by arranging colors on a virtual canvas, where the colors are selected from a virtual color wheel or the like that is part of an automated color harmonization software application package. After the user 22 has completed the specification of the target color template, the search engine 14 extracts parameters that define the target color template and incorporates the extracted parameters into an image query.

C. Retrieving Image Content

a. Overview

As explained above, the compositional balance and color driven content retrieval system 10 retrieves at least one of the images 20 from a database based on a respective score that is calculated for each of the images from the image query, the respective visual weight model, and the respective color model (see FIG. 2, blocks 26 and 28). In this process, the search engine 14 compares the image query to the indices 18 and returns to the user interface 16 ones of the indices 18 that are determined to match the image query. The search engine 14 ranks the indices 18 based on a scoring function that produces values indicative of the level of match between the image query and the respective indices 18, which define the respective models of visual weight in the images 20.

b. Determining a Respective Visual Weight Comparison Value for Each Image

In some embodiments, the search engine 14 calculates for each image i in the collection of images 20 a visual weight comparison function that decreases with increasing spatial distance between the image query and the respective model of visual weight in the image. In some of these embodiments, the visual weight comparison function varies inversely with respect to the distance between the centroid specified in the image query and the centroid of the image visual weight model and varies inversely with respect to the respective distance between the horizontal and vertical spreads specified in the image query and the horizontal and vertical spreads of the image visual weight model. Equation (26) defines an exemplary visual weight comparison function of this type:

$\begin{matrix}{{VisualWeightScore}_{i} = \frac{1}{1 + {f( \Delta_{{centroid},i} )} + {g( \Delta_{{spread},i} )}}} & (26)\end{matrix}$

where Δ_(centroid,i) measures the distance between the centroid specified in the image query and the centroid of the visual weight model of image i, f( ) is a monotonically increasing function of Δ_(centroid,i), Δ_(spread,i) measures the distance between the horizontal and vertical spreads specified in the image query and the horizontal and vertical spreads of the visual weight model of image i, and g( ) is a monotonically increasing function of Δ_(spread,i). In some embodiments, Δ_(centroid,i) and Δ_(spread,i) are defined in equations (27) and (28):

$\begin{matrix}{\Delta_{{centroid},i} = \sqrt{\left( x_{{image}\ i\text{-}{centroid}} - x_{{query}\text{-}{centroid}} \right)^{2} + \left( y_{{image}\ i\text{-}{centroid}} - y_{{query}\text{-}{centroid}} \right)^{2}}} & (27) \\ {\Delta_{{spread},i} = \sqrt{\left( \sigma_{{image}\ i\text{-}H} - \sigma_{{query}\text{-}H} \right)^{2} + \left( \sigma_{{image}\ i\text{-}V} - \sigma_{{query}\text{-}V} \right)^{2}}} & (28)\end{matrix}$

In some embodiments, f(Δ_(centroid,i)) is given by:

f(Δ_(centroid,i))=λ·Δ_(centroid,i)^(ε)  (29)

where λ and ε are empirically determined constants. In some exemplary embodiments, 1≦λ≦5 and ε=2. In some embodiments, g(Δ_(spread,i)) is given by:

g(Δ_(spread,i))=ω·Δ_(spread,i)^(ψ)  (30)

where ω and ψ are empirically determined constants. In some exemplary embodiments, 1≦ω≦5 and 1≦ψ≦2.
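A compact sketch of the comparison defined by equations (26)-(30) follows. It is illustrative only; the dictionary-based parameter representation and the default constants (λ=1, ε=2, ω=1, ψ=2, chosen from within the exemplary ranges above) are assumptions.

```python
import math

# Sketch of the visual weight comparison of equations (26)-(30).
# query and image are dicts with keys cx, cy (centroid) and
# sigma_h, sigma_v (spreads); this representation is assumed.

def visual_weight_score(query, image, lam=1.0, eps=2.0, omega=1.0, psi=2.0):
    delta_centroid = math.hypot(image["cx"] - query["cx"],
                                image["cy"] - query["cy"])          # equation (27)
    delta_spread = math.hypot(image["sigma_h"] - query["sigma_h"],
                              image["sigma_v"] - query["sigma_v"])  # equation (28)
    f = lam * delta_centroid ** eps                                 # equation (29)
    g = omega * delta_spread ** psi                                 # equation (30)
    return 1.0 / (1.0 + f + g)                                      # equation (26)
```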

In some embodiments the visual weight comparison function defined in equation (26) may be scaled by a default or user-selected measure of visual appeal in accordance with equation (31):

$\begin{matrix}{{VisualWeightScore}_{i} = \frac{Q( M_{i,j} )}{1 + {f( \Delta_{{centroid},i} )} + {g( \Delta_{{spread},i} )}}} & (31)\end{matrix}$

where Q(M_(i,j)) is a quality function of M_(i,j), which is a quality map j of image i. The quality map M_(i,j) may correspond to any of the maps described herein, including but not limited to the visual appeal map, the sharpness map, the contrast map, and the color map. In some embodiments, Q(M_(i,j)) is a two-dimensional integral of the quality map M_(i,j).
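The scaled variant of equation (31) can be sketched in the same style. Representing the quality map as nested lists and taking its two-dimensional integral as a simple sum are illustrative assumptions, not the patented implementation.

```python
import math

# Sketch of equation (31): the score of equation (26) scaled by Q(M_ij),
# here approximated as the sum (discrete 2-D integral) of a quality map
# given as nested lists of per-element quality values (an assumption).

def scaled_visual_weight_score(query, image, quality_map,
                               lam=1.0, eps=2.0, omega=1.0, psi=2.0):
    delta_centroid = math.hypot(image["cx"] - query["cx"],
                                image["cy"] - query["cy"])
    delta_spread = math.hypot(image["sigma_h"] - query["sigma_h"],
                              image["sigma_v"] - query["sigma_v"])
    q = sum(sum(row) for row in quality_map)   # Q(M_ij)
    return q / (1.0 + lam * delta_centroid ** eps + omega * delta_spread ** psi)
```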

c. Determining a Respective Color Comparison Value for Each Image

In some embodiments, the search engine 14 calculates an image-based color comparison function (ColorScore_(i)) for each image i in the collection of the images 20. The color comparison function is based on a region-based color comparison function that compares each of the regions u in the target color template with each of the regions v in the color model determined for each of the images 20. In some embodiments, the color comparison function decreases with increasing spatial distance between the regions in the target color template and the regions in the image color model, decreases with increasing Euclidean distance between the regions in the target color template and the regions in the image color model in a color space (typically the CIE Lab color space), and increases with the sizes of the target template regions and the image color model regions. Equation (32) defines an exemplary region-based color comparison function of this type:

$\begin{matrix}{{ColorComp}_{{uv},i} = \frac{s( {{Size}_{u},{Size}_{v}} )}{{a( \Delta_{{centroid},{uv}} )} \cdot {b( \Delta_{{color},{uv}} )}}} & (32)\end{matrix}$

In equation (32), s( ) is a function of the size (Size_(u)) of the target color template region u and the size (Size_(v)) of the image color model region v of image i, a( ) is a function of Δ_(centroid,uv), which measures the spatial distance between the centroid of the target color template region u and the centroid of the image color model region v, and b( ) is a function of Δ_(color,uv), which measures the Euclidean color space distance between the centroid of the target color template region u and the centroid of the image color model region v of image i. In some embodiments, Δ_(centroid,uv) is calculated in accordance with equation (33):

$\begin{matrix}{\Delta_{{centroid},{uv}} = \sqrt{( {{centroidX}_{u} - {centroidX}_{v}} )^{2} + ( {{centroidY}_{u} - {centroidY}_{v}} )^{2}}} & (33)\end{matrix}$

where (centroidX_(u),centroidY_(u)) is the centroid location of the target color template region u and (centroidX_(v),centroidY_(v)) is the centroid location of the image color model region v. For image queries that are designed to retrieve images that the user intends to insert into a document, Δ_(centroid,uv) measures the spatial distance between the target color template region u and the color model region v for the candidate image positioned in a designated target location in the document, as shown in FIG. 26 where the image color model 169 (see FIG. 16B) is inserted into the color model of FIG. 24C that was produced for document 56 (see FIG. 20). In some embodiments, Δ_(color,uv) is calculated in accordance with equation (34):

$\begin{matrix}{\Delta_{{color},{uv}} = \sqrt{( {{aveL}_{u} - {aveL}_{v}} )^{2} + ( {{aveA}_{u} - {aveA}_{v}} )^{2} + ( {{aveB}_{u} - {aveB}_{v}} )^{2}}} & (34)\end{matrix}$

where (aveL_(u),aveA_(u),aveB_(u)) are the average L, a, and b color values of the target color template region u and (aveL_(v),aveA_(v),aveB_(v)) are the average L, a, and b color values of the image color model region v of image i.

In some of these embodiments, s( ) is given by equation (35), a( ) is given by equation (36), and b( ) is given by equation (37):

s(Size_(u),Size_(v))=(Size_(u)×Size_(v))^(R)  (35)

a(Δ_(centroid,uv))=S+T·(Δ_(centroid,uv))^(W)  (36)

b(Δ_(color,uv))=H+L·(Δ_(color,uv))^(M)  (37)

where R, S, T, W, H, L, and M have empirically determined constant values. In one exemplary embodiment, R=0.5, S=T=W=H=L=1, and M=4.
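The region-based comparison of equations (32)-(37) can be sketched as follows. The default constants use the exemplary values R=0.5, S=T=W=H=L=1, and M=4 given above, but the region data layout itself is an assumption made for illustration.

```python
import math

# Sketch of the region-based color comparison of equations (32)-(37).
# A region is a dict with keys "centroid" (x, y), "size" (element count),
# and "color" (average L, a, b values); this layout is assumed.

def color_comp(u, v, R=0.5, S=1.0, T=1.0, W=1.0, H=1.0, L=1.0, M=4.0):
    delta_centroid = math.hypot(u["centroid"][0] - v["centroid"][0],
                                u["centroid"][1] - v["centroid"][1])         # equation (33)
    delta_color = math.sqrt(sum((cu - cv) ** 2
                                for cu, cv in zip(u["color"], v["color"])))  # equation (34)
    s = (u["size"] * v["size"]) ** R                                         # equation (35)
    a = S + T * delta_centroid ** W                                          # equation (36)
    b = H + L * delta_color ** M                                             # equation (37)
    return s / (a * b)                                                       # equation (32)
```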

In some embodiments, the image-based color comparison function (ColorScore_(i)) is calculated from the region-based color comparison function (ColorComp_(uv,i)) for each image i in the collection of images 20 in accordance with equation (38):

$\begin{matrix}{{ColorScore}_{i} = {\sum\limits_{u \in {document}}{\sum\limits_{v \in {{image}\mspace{14mu} i}}{ColorComp}_{{uv},i}}}} & (38)\end{matrix}$
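Equation (38) then reduces to a double sum over template/image region pairs, as in the brief sketch below, which assumes the hypothetical color_comp() helper from the preceding sketch.

```python
# Sketch of equation (38): sum the region-based comparison over every pair of
# a target color template region u and an image color model region v.
# Assumes the color_comp() helper sketched above.

def color_score(template_regions, image_regions):
    return sum(color_comp(u, v)
               for u in template_regions
               for v in image_regions)
```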

d. Determining a Respective Score for Each Image

In some embodiments, the search engine 14 calculates the respective score (ImageScore_(i)) from an evaluation of a joint scoring function that involves a multiplication together of the respective visual weight comparison value (VisualWeightScore_(i)) and the respective color comparison value (ColorScore_(i)), as defined in equation (39):

ImageScore_(i)=φ(VisualWeightScore_(i))·θ(ColorScore_(i))  (39)

where φ( ) is a function of the visual weight comparison value (VisualWeightScore_(i)) that was computed for image i and θ( ) is a function of the color comparison value (ColorScore_(i)) that was computed for image i.

In some embodiments, the functions φ( ) and θ( ) are given by equations (40) and (41):

φ(VisualWeightScore_(i))=χ+μ·(VisualWeightScore_(i))^(ν)  (40)

θ(ColorScore_(i))=ρ+ç·(ColorScore_(i))^(τ)  (41)

where χ, μ, ν, ρ, ç, and τ are empirically determined constants. In one exemplary embodiment, χ=ρ=0, μ=ç=1, ν=2, and τ=1. In another exemplary embodiment, χ=ρ=0, μ=ç=1, ν=1, and τ=0.5.
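The joint score of equations (39)-(41) multiplies the two comparison values after applying φ( ) and θ( ). The sketch below uses the first exemplary constant set (χ=ρ=0, μ=ç=1, ν=2, τ=1); the function and parameter names are illustrative.

```python
# Sketch of the joint score of equations (39)-(41). The inputs are the
# per-image visual weight and color comparison values computed above;
# zeta stands in for the constant written as ç in the text.

def image_score(visual_weight_score, color_score,
                chi=0.0, mu=1.0, nu=2.0, rho=0.0, zeta=1.0, tau=1.0):
    phi = chi + mu * visual_weight_score ** nu    # equation (40)
    theta = rho + zeta * color_score ** tau       # equation (41)
    return phi * theta                            # equation (39)
```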

The search engine 14 identifies one or more of the images 20 that have greatest likelihood of matching the image query based on the respective ImageScores_(i) and retrieves the one or more identified images.

In some embodiments, before ranking the images 20 in terms of their likelihoods of matching the image query, the search engine 14 adjusts the respective ImageScores_(i) to reduce likelihoods of matching the image query to ones of the images 20 having respective scores that meet a high likelihood of match predicate and respective visual weight comparison values that meet a low likelihood of visual weight match predicate. For example, in some exemplary embodiments, the search engine reduces the ImageScore_(i) if the following conditions are met:

ImageScore_(i)>highMatchThreshold  (42)

φ(VisualWeightScore_(i))<ω_(LVWMS)  (43)

where ω_(LVWMS) is the lowVisualMatchThreshold, and highMatchThreshold and ω_(LVWMS) have empirically determined constant values. In these embodiments, the search engine 14 also adjusts the respective scores to reduce likelihoods of matching the image query to ones of the images 20 having respective scores that meet the high likelihood of match predicate and respective color comparison values that meet a low likelihood of color match predicate. For example, in some exemplary embodiments, the search engine also reduces the ImageScore_(i) if the following conditions are met:

ImageScore_(i)>highMatchThreshold  (44)

θ(ColorScore_(i))<ω_(LCMS)  (45)

where ω_(LCMS) is the lowColorMatchThreshold and has an empirically determined constant value.

In some of these embodiments, if either (i) the conditions defined in equations (42) and (43) are met or (ii) the conditions defined in equations (44) and (45) are met, the search engine 14 sets the ImageScores_(i) for these images to a value within the rectangular region 171 shown in FIG. 27. In this way, these embodiments ensure that the search engine 14 will not retrieve extreme images in which either the visual weight contribution to the ImageScore_(i) or the color contribution to the ImageScore_(i) is below an empirically determined level needed for an acceptable image.
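As a rough illustration of the demotion rule in equations (42)-(45), the sketch below clamps the score of an image that looks like a strong overall match but whose visual weight term or color term alone is weak. The threshold values and the penalty value standing in for region 171 of FIG. 27 are hypothetical.

```python
# Sketch of the adjustment of equations (42)-(45). phi_value and theta_value
# are the per-image values of phi(VisualWeightScore_i) and theta(ColorScore_i).
# All thresholds and the penalty value are illustrative assumptions.

def adjust_score(image_score, phi_value, theta_value,
                 high_match_threshold=0.8,      # highMatchThreshold
                 low_visual_threshold=0.2,      # omega_LVWMS
                 low_color_threshold=0.2,       # omega_LCMS
                 penalty_score=0.1):            # stands in for region 171
    weak_visual = phi_value < low_visual_threshold       # condition (43)
    weak_color = theta_value < low_color_threshold       # condition (45)
    if image_score > high_match_threshold and (weak_visual or weak_color):
        return penalty_score
    return image_score
```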

FIG. 28 shows three different average precision-recall curves in a document-based image query application environment. Here, precision indicates how many of the returned images are correct (true) and recall indicates how many of the correct (true) images the search engine 14 returns. The precision-recall curve 181 measures the performance of the search engine 14 when only color model parameters are used in the image scoring function, the precision-recall curve 183 measures the performance of the search engine 14 when only visual weight model parameters are used in the image scoring function, and the precision-recall curve 185 measures the performance of the search engine 14 when the joint visual weight and color image scoring function described above is used by the search engine 14. FIG. 28 illustrates the improved search engine performance that results from the use of the joint scoring function, which captures isolated high quality regions in the visual quality map that visually balance the document, along with the color tonalities that fulfill the desired analogous color harmony.

V. Exemplary Architecture of the Compositional Balance and Color Driven Content Retrieval System

Embodiments of the compositional balance and color driven content retrieval system 10 may be implemented by one or more discrete modules (or data processing components) that are not limited to any particular hardware, firmware, or software configuration. In the illustrated embodiments, the modules may be implemented in any computing or data processing environment, including in digital electronic circuitry (e.g., an application-specific integrated circuit, such as a digital signal processor (DSP)) or in computer hardware, firmware, device driver, or software. In some embodiments, the functionalities of the modules are combined into a single data processing component. In some embodiments, the respective functionalities of each of one or more of the modules are performed by a respective set of multiple data processing components.

In some implementations, process instructions (e.g., machine-readable code, such as computer software) for implementing the methods that are executed by the embodiments of the compositional balance and color driven content retrieval system 10, as well as the data it generates, are stored in one or more machine-readable media. Storage devices suitable for tangibly embodying these instructions and data include all forms of non-volatile computer-readable memory, including, for example, semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices, magnetic disks such as internal hard disks and removable hard disks, magneto-optical disks, DVD-ROM/RAM, and CD-ROM/RAM.

In general, embodiments of the compositional balance and color driven content retrieval system 10 may be implemented in any one of a wide variety of electronic devices, including desktop computers, workstation computers, and server computers.

FIG. 29 shows an embodiment of a computer system 180 that can implement any of the embodiments of the compositional balance and color driven content retrieval system 10 that are described herein. The computer system 180 includes a processing unit 182 (CPU), a system memory 184, and a system bus 186 that couples processing unit 182 to the various components of the computer system 180. The processing unit 182 typically includes one or more processors, each of which may be in the form of any one of various commercially available processors. The system memory 184 typically includes a read only memory (ROM) that stores a basic input/output system (BIOS) that contains start-up routines for the computer system 180 and a random access memory (RAM). The system bus 186 may be a memory bus, a peripheral bus or a local bus, and may be compatible with any of a variety of bus protocols, including PCI, VESA, Microchannel, ISA, and EISA. The computer system 180 also includes a persistent storage memory 68 (e.g., a hard drive, a floppy drive, a CD ROM drive, magnetic tape drives, flash memory devices, and digital video disks) that is connected to the system bus 186 and contains one or more computer-readable media disks that provide non-volatile or persistent storage for data, data structures and computer-executable instructions.

A user may interact (e.g., enter commands or data) with the computer 180 using one or more input devices 190 (e.g., a keyboard, a computer mouse, a microphone, joystick, and touch pad). Information may be presented through a graphical user interface (GUI) that is displayed to the user on a display monitor 192, which is controlled by a display controller 194. The computer system 180 also typically includes peripheral output devices, such as speakers and a printer. One or more remote computers may be connected to the computer system 180 through a network interface card (NIC) 196.

As shown in FIG. 29, the system memory 184 also stores the compositional balance and color driven content retrieval system 10, a GUI driver 198, and at least one database 200 containing input data, processing data, and output data. In some embodiments, the compositional balance and color driven content retrieval system 10 interfaces with the GUI driver 198 and the user input devices 190 to present a user interface for managing and controlling the operation of the compositional balance and color driven content retrieval system 10.

VI. Conclusion

The embodiments that are described in detail herein are capable of retrieving images (e.g., digital photographs, video frames, scanned documents, and other image-based graphic objects including mixed content objects) based on specified compositional balance and color criteria. In some of these embodiments, images are indexed in accordance with models of their respective distributions of visual weight and color. Images are retrieved based on comparisons of their associated visual weight and color based indices with the parameters of the compositional balance and color driven image queries.

Some embodiments are able to generate compositional balance and color driven queries from analyses of the distributions of visual weight and color in a document and a specified compositional balance objective. In this way, these embodiments may be used, for example, in digital publishing application environments to automatically retrieve one or more images that have colors that harmonize with a document under construction and that satisfy a compositional balance objective for the document.

Other embodiments are within the scope of the claims.

1. A method, comprising: determining for each image in a collection of images a respective model of visual weight in the image and a respective model of color in the image; generating an image query from a target visual weight distribution and a target color template; calculating for each of the images a respective score from the image query, the respective visual weight model, and the respective color model; and retrieving at least one of the images from a database based on the respective scores.
2. The method of claim 1, wherein the determining comprises for each of the images identifying areas of the image highest in visual appeal, and building the respective model of visual weight in the image to approximate a distribution of the identified areas of the image.
3. The method of claim 1, wherein the determining comprises for each of the images segmenting the image into regions, and labeling each of the regions with a respective color.
4. The method of claim 3, wherein the determining additionally comprises for each of the images calculating a respective centroid and a respective size for ones of the labeled regions, and building the respective color model from the calculated centroids and the calculated sizes.
5. The method of claim 1, wherein the calculating comprises for each of the images calculating the respective score from a respective visual weight comparison value and a respective color comparison value, the respective visual weight comparison value compares the target visual weight distribution and the respective visual weight model of the image, and the color comparison value compares the target color template and the respective color model of the image.
6. The method of claim 5, wherein the calculating comprises for each of the images calculating the respective visual weight comparison value from a measure of distance between the target visual weight distribution and the respective visual weight model of the image, and calculating the respective color comparison value from a measure of distance between the target color template and the respective color model of the image.
7. The method of claim 5, wherein the calculating comprises for each of the images calculating the respective score from an evaluation of a joint scoring function that involves a multiplication together of the respective visual weight comparison value and the respective color comparison value.
8. The method of claim 5, wherein: the retrieving comprises identifying one or more of the images having greatest likelihood of matching the image query based on the respective scores and retrieving the one or more identified images; and the calculating comprises adjusting the respective scores to reduce likelihoods of matching the image query to ones of the images having respective scores that meet a high likelihood of match predicate and respective visual weight comparison values that meet a low likelihood of visual weight match predicate, and adjusting the respective scores to reduce likelihoods of matching the image query to ones of the images having respective scores that meet the high likelihood of match predicate and respective color comparison values that meet a low likelihood of color match predicate.
9. The method of claim 1, wherein the generating comprises constructing the target visual weight distribution from a model of visual weight in a document, wherein the constructing comprises calculating a center of visual weight in the document and determining the model of visual weight in the document based on the calculated center of visual weight.
10. The method of claim 9, wherein the generating comprises producing the image query from the model of visual weight in the document in accordance with a compositional balance objective for the document, and the producing comprises geometrically transforming the model of visual weight in the document in accordance with the compositional balance objective to produce the target visual weight distribution.
11. The method of claim 1, wherein the generating comprises constructing the target color template from a document, wherein the constructing comprises segmenting the document into regions, labeling each of the regions with a respective color, calculating a respective centroid and a respective size for one or more of the labeled regions, and building the target color template from the calculated centroids and the calculated sizes.
12. The method of claim 11, wherein the calculating comprises for each of the images calculating the respective score from a respective measure of distance between the target color template and the respective color model of the image positioned in a designated target location in the document.
13. A machine readable medium storing machine-readable instructions causing a machine to perform operations comprising: determining for each image in a collection of images a respective model of visual weight in the image and a respective model of color in the image; generating an image query from a target visual weight distribution and a target color template; calculating for each of the images a respective score from the image query, the respective visual weight model, and the respective color model; and retrieving at least one of the images from a database based on the respective scores.
14. The machine readable medium of claim 13, wherein, for each of the images, the machine-readable instructions cause the machine to perform operations comprising segmenting the image into regions, labeling each of the regions with a respective color, calculating a respective centroid and a respective size for ones of the labeled regions, and building the respective color model from the calculated centroids and the calculated sizes.
15. The machine readable medium of claim 13, wherein, for each of the images, the machine-readable instructions cause the machine to perform operations comprising calculating the respective score from a respective visual weight comparison value and a respective color comparison value, wherein the respective visual weight comparison value compares the target visual weight distribution and the respective visual weight model of the image, and the color comparison value compares the target color template and the respective color model of the image.
16. The machine readable medium of claim 15, wherein, for each of the images, the machine-readable instructions cause the machine to perform operations comprising calculating the respective visual weight comparison value from a measure of distance between the target visual weight distribution and the respective visual weight model of the image, and calculating the respective color comparison value from a measure of distance between the target color template and the respective color model of the image.
17. The machine readable medium of claim 15, wherein, for each of the images, the machine-readable instructions cause the machine to perform operations comprising calculating the respective score from an evaluation of a joint scoring function that involves a multiplication together of the respective visual weight comparison value and the respective color comparison value.
18. An apparatus, comprising: a memory; a modeling engine operable to determine for each image in a collection of images a respective model of visual weight in the image and a respective model of color in the image; a search engine operable to generate an image query from a target visual weight distribution and a target color template, the search engine being additionally operable to calculate for each of the images a respective score from the image query, the respective visual weight model, and the respective color model; and a user interface application operable to retrieve at least one of the images from a database based on the respective scores.
19. The apparatus of claim 18, wherein, for each of the images, the modeling engine is operable to segment the image into regions, label each of the regions with a respective color, calculate a respective centroid and a respective size for ones of the labeled regions, and build the respective color model from the calculated centroids and the calculated sizes.
20. The apparatus of claim 18, wherein, for each of the images, the search engine is operable to calculate the respective score from a respective visual weight comparison value and a respective color comparison value, wherein the respective visual weight comparison value compares the target visual weight distribution and the respective visual weight model of the image, and the color comparison value compares the target color template and the respective color model of the image.
21. The apparatus of claim 20, wherein, for each of the images, the search engine is operable to calculate the respective visual weight comparison value from a measure of distance between the target visual weight distribution and the respective visual weight model of the image, and calculate the respective color comparison value from a measure of distance between the target color template and the respective color model of the image.
22. The apparatus of claim 20, wherein, for each of the images, the search engine is operable to calculate the respective score from an evaluation of a joint scoring function that involves a multiplication together of the respective visual weight comparison value and the respective color comparison value.